(2.4.3) IBM G5

Timothy J. Slegel, et al. IBM's S/390 G5 Microprocessor, IEEE Micro, Mar/Apr 1999, pp. 12-23.

S/390 G5 processor : 1999 
     CMOS took 10 years to reach the performance of the last bipolar system
     L2 ran at half speed (500 MHz core clock at 1.7 V)
     full custom 

microarch
     not superscalar
     ESA/390 : older ISA : numerous, relatively commonly used instructions that require tens or hundreds of clock cycles to execute.
                     not load-store
                     too complex, difficult to implement : decimal data instr, addressing modes, multiple address spaces, precise interrupts, VM emulation and 2 different floating point archs.
     G5 uses millicode 




     L1 cache : buffer control element : cache itself, cache directory, TLB, address translation logic
     I unit : instr fetch, decode, addr gen, queue of instr waiting to execute
     E unit : exec, local working copy of registers
     R unit : recovery unit : checkpointed copy of the microarch state, plus the timing facility

L1 cache : unified (instr, operand, millicode data), store-through, 2-way interleaved

>> 256 bytes : ideal cache line size : compromise between fetch time of the last byte and perf improvement

Absolute address history table : predicts the absolute (translated) address bits before the TLB lookup completes
TLB : dynamic address translation and access register translation
ART : access register translation, cached in the ART lookaside buffer (ALB)

BTB : 2-way set associative, 2048 entries
Decimal unit for financial data 
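
Sketch of the BTB organization noted above: 2048 entries, 2-way set associative, so 1024 sets of 2 ways. Only those two numbers come from the paper notes. The index/tag split, the LRU replacement and the class and method names are my own assumptions for illustration, not the G5 design.

```python
# Minimal model of a 2-way set-associative BTB with 2048 entries.
# Assumptions (not from the paper): index/tag split, LRU replacement.

NUM_ENTRIES = 2048
WAYS = 2
NUM_SETS = NUM_ENTRIES // WAYS  # 1024 sets


class BranchTargetBuffer:
    def __init__(self):
        # Each set holds up to WAYS (tag, target) pairs, most recently used last.
        self.sets = [[] for _ in range(NUM_SETS)]

    @staticmethod
    def _split(branch_addr):
        # Assumption: instructions are halfword aligned, so bit 0 is dropped.
        halfword = branch_addr >> 1
        return halfword % NUM_SETS, halfword // NUM_SETS  # (index, tag)

    def lookup(self, branch_addr):
        """Return the predicted target, or None on a BTB miss."""
        index, tag = self._split(branch_addr)
        ways = self.sets[index]
        for i, (t, target) in enumerate(ways):
            if t == tag:
                ways.append(ways.pop(i))  # mark as most recently used
                return target
        return None

    def update(self, branch_addr, target):
        """Record a taken branch, evicting the LRU way if the set is full."""
        index, tag = self._split(branch_addr)
        ways = self.sets[index]
        for i, (t, _) in enumerate(ways):
            if t == tag:
                ways.pop(i)  # replace the existing entry for this branch
                break
        else:
            if len(ways) == WAYS:
                ways.pop(0)  # evict the least recently used way
        ways.append((tag, target))
```

With 2 ways, two branches that map to the same set can coexist; a third one pushes out whichever was used least recently.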

Exception handler : 
     I unit tries to find out which instruction causes an exception
     uses single instruction mode : all previous instructions are drained and the suspect instr is sent through alone
     normal speed detection : gross, pessimistic check
     single instr mode : actual precise check
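
Rough sketch of the two-level exception check described above: a cheap pessimistic check at normal speed, then a precise check with the suspect instruction running alone. The function names, the string instructions and the two predicates are hypothetical; the real hardware conditions are not in these notes.

```python
# Two-level exception detection: pessimistic check at full speed,
# precise check in single-instruction mode.
# Assumption: may_fault never misses a real exception (it only over-reports).

def run(instructions, may_fault, does_fault):
    """Return (log, fault_index).

    log is a list of (instr, mode) with mode "normal" or "single".
    fault_index is the position of the first real exception, or None.
    """
    log = []
    for i, instr in enumerate(instructions):
        if not may_fault(instr):
            # Pessimistic check passed: keep running at normal speed.
            log.append((instr, "normal"))
            continue
        # Pessimistic check fired: older instructions have completed, so
        # rerun this one alone and apply the precise check.
        log.append((instr, "single"))
        if does_fault(instr):
            return log, i
    return log, None


def may_fault(instr):
    # Hypothetical gross check: any divide is suspicious.
    return instr.startswith("div")


def does_fault(instr):
    # Hypothetical precise check: only divide-by-zero really faults.
    return instr == "div r1, 0"
```

The point of the split is that the common case pays nothing: only instructions flagged by the gross check take the slow path, and most of those turn out to be false alarms.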

Registers
     E unit has local copies
     R unit has the RAM-type master copy (architectural state) of registers
     on commit, R unit is written (with ECC)
     R unit used for recovery (checkpoint)
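
Toy model of the register scheme above: the E unit works on local copies, each commit also writes the R unit master copy, and recovery reloads the E unit from that checkpoint. Class and method names are made up, and ECC on the R unit array is left out.

```python
# E unit working registers plus R unit checkpoint (illustrative only).

class RUnit:
    def __init__(self, num_regs):
        self.checkpoint = [0] * num_regs  # master (architectural) copy

    def commit(self, reg, value):
        # In the real design this write is ECC protected; omitted here.
        self.checkpoint[reg] = value

    def snapshot(self):
        return list(self.checkpoint)


class EUnit:
    def __init__(self, r_unit):
        self.r_unit = r_unit
        self.regs = r_unit.snapshot()  # local working copy

    def execute(self, reg, value, commit=True):
        self.regs[reg] = value
        if commit:
            self.r_unit.commit(reg, value)

    def recover(self):
        # Throw away uncommitted (possibly corrupt) state.
        self.regs = self.r_unit.snapshot()
```

Anything written with commit=False models a result that never made it past the checkers; recover() rolls the working copy back to the last committed value.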

Millicode : executing a complex instr is like a hardwired subroutine call
     > uses a completely independent set of registers
     > also service functions : hardware error logs, scrubbing memory for correctable errors, supporting operator console functions and controlling low-level I/O operations
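
A sketch of the millicode idea as I understand it: simple instructions are hardwired, complex ones jump to a millicode routine that runs with its own register set. The SUMRANGE instruction, the routine, and the size of the millicode register file are invented for illustration.

```python
# Millicode as a subroutine call with a separate register set.
# Assumptions: instruction names and the number of millicode registers.

class Cpu:
    def __init__(self):
        self.gprs = [0] * 16   # program-visible general registers
        self.mregs = [0] * 16  # millicode-only registers (size assumed)
        self.millicode = {"SUMRANGE": self._mc_sumrange}

    def execute(self, op, d, a, b):
        if op == "ADD":
            # Hardwired: done directly by the execution unit.
            self.gprs[d] = self.gprs[a] + self.gprs[b]
        elif op in self.millicode:
            # Complex instruction: enter the millicode routine.
            self.millicode[op](d, a, b)
        else:
            raise ValueError("unknown instruction: " + op)

    def _mc_sumrange(self, d, a, b):
        # Hypothetical complex instruction: gprs[d] = sum(gprs[a..b]).
        # Scratch values live in mregs, so no program register is clobbered.
        self.mregs[0] = 0
        for r in range(a, b + 1):
            self.mregs[0] += self.gprs[r]
        self.gprs[d] = self.mregs[0]
```

Because the routine has its own scratch registers, it does not need to save and restore program state on entry and exit, which is what makes the call cheap.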
     
Virtual machine emulation
     hw support : 3 complete copies of all architected control registers, 3 copies of timing facility registers (host mode, first-level guest and second-level guest)

2 types of floating point

symmetric multiprocessor : memory : uniform access time. 

lots of reliability features
     error checking, parity, state checking, local duplication of control logic and so on. 20-30% of the logic is for error checking/correction!
     full duplicate I unit and E unit. if the signals don't match, hardware error recovery is invoked.
     in an array of processors : delete mechanism removes a processor completely.
     memory : redundant word-lines to automatically replace defective sections (even at the customer site)
     Processor availability facility (PAF) : scans out the latches from a check-stopped processor, stores them in a special area, OS then resumes the work on another processor
     PAF + concurrent sparing => completely transparent to the customer. impressive.
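
Toy illustration of duplicated units plus PAF-style sparing from the list above: two copies compute each result, a mismatch blocks the commit, and the checkpointed state is moved to a spare processor that carries on. Everything here (the accumulator state, fail_after fault injection, function names) is an assumption for illustration; instruction retry and the real scan-out mechanism are not modelled.

```python
# Duplicate-and-compare execution with state migration to a spare.

class Processor:
    def __init__(self, name, fail_after=None):
        self.name = name
        self.fail_after = fail_after  # steps before copy B goes bad (None = never)
        self.steps = 0
        self.state = {"acc": 0}       # stands in for the checkpointed state

    def step(self, x):
        """Run one operation on both copies; commit only if they agree."""
        a = self.state["acc"] + x
        broken = self.fail_after is not None and self.steps >= self.fail_after
        b = a ^ 1 if broken else a    # injected fault flips one bit in copy B
        if a != b:
            return False              # mismatch: nothing is committed
        self.state["acc"] = a
        self.steps += 1
        return True


def run_with_sparing(work, primary, spare):
    cpu = primary
    for x in work:
        if not cpu.step(x):
            if cpu is spare:
                raise RuntimeError("spare processor also failed")
            # Move the last good state to the spare and redo the operation.
            spare.state = dict(cpu.state)
            cpu = spare
            if not cpu.step(x):
                raise RuntimeError("spare processor also failed")
    return cpu
```

From the caller's side the work list just completes; the only visible difference is which processor ends up holding the result.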